# Hybrid Type CAM Design for Both Power and Performance Master Slave Match Line

Loganathan.K PG Scholar, Nandha Engineering College, Erode, India.

Prem Kumar.P Assistant Professor, Nandha Engineering College, Erode, India.

Abstract - Content-addressable memory (CAM) compares the search data with all the stored data in parallel. Because a large number of transistors are active on each lookup, the power consumption of CAM is usually considerable. There are two conventional CAM designs, i.e., NOR-type and NAND-type CAMs. The NOR-type CAM provides the best search performance, but at the cost of large power consumption. In contrast, the NAND-type CAM trades search performance for low power. The master-slave match line design uses NOR-type CAM cells, so its power consumption is quite high. This work therefore presents a hybrid-type CAM design that aims to combine the performance advantage of the NOR-type CAM with the power efficiency of the NAND-type CAM. In our design, a CAM word is divided into two segments, and all the CAM cells are decoupled from the match line. By minimizing both the match line capacitance and the switching activity, our design can largely reduce the power consumption of CAM. In addition, the hybrid-type CAM provides a fast pull-down path to speed up the discharge of the lightweight match line.

Index Terms – Content-addressable memory (CAM), hybrid-type CAM, NAND-type CAM, NOR-type CAM.

## 1) INTRODUCTION

Content-addressable memory (CAM) is a storage that is addressed by its content (or data) rather than by a memory address. It is widely used in many applications that require fast table lookup. Because of the frequent lookups and the parallel comparison, in which a large number of transistors and wires are active on each lookup, the power consumption of CAM is usually considerable. In a CAM, the match lines (MLs) and search lines are the major power consumers. An ML is a long wire with large capacitance, and every search causes a large amount of ML switching activity. Thus, the ML power consumption is very large: the MLs contribute 65%–88% of the total ternary CAM power consumption.

Traditionally, there are two ML architectures: NOR-type ML and NAND-type ML. The NOR-type ML provides the best search performance, but at the cost of large ML power consumption. In contrast, the NAND-type ML trades search performance for low power. According to the related work, the ML power consumption can be reduced by several methods, including ML segmentation, pipelined search schemes, and reducing the ML voltage swing. In this paper, we propose a new ML architecture, called master–slave ML (MSML). The key concept of the MSML design is to combine the master–slave architecture and the charge sharing technique to reduce the CAM power dissipated in ML switching.

The features of the MSML design are as follows:

- 1) In our design, it is unnecessary to discharge all the bit lines to prevent unexpected short-circuit power consumption. The power dissipated in bit line switching activities can thus be effectively reduced.
- 2) Because all the CAM cells are decoupled from the ML, the match line is lightweight. Moreover, only the matched words discharge the match line from high to low. The match lines are no longer the major power consumer in our design.
- 3) The hybrid-type CAM provides an additional fast pull-down path to speed up the match line discharge in case of a word match. Independent of the segment size, this fast path ensures that high search performance can be realized.
- 4) Because a level-restore path is added to the match line, our design is immune to the false match incurred by a possible race condition.



Fig. 1. Typical CAM cell. (a) XOR type. (b) XNOR type



## 2) CONTENT-ADDRESSABLE MEMORY

A CAM consists mainly of CAM cells. Fig. 1(a) shows a typical XOR CAM cell that consists of two parts: 1) one for storing data, called the store unit; and 2) the other for comparing data, referred to as the compare unit. The store unit is usually implemented as a traditional 6T SRAM cell that contains a cross-coupled inverter pair. The compare unit is pass-transistor logic (PTL) that compares the stored data with the search data. Depending on the application, the XOR compare unit can be modified into XNOR logic, as in Fig. 1(b). Besides the store and compare units, a pull-down transistor *M*, which is gate-controlled by the comparison result, is necessary to connect/disconnect the ML to/from the ground.

#### A. NOR-Type CAM

Traditionally, there are two types of CAM designs. As shown in Fig. 2, one is NOR-type and the other is NAND-type. In the NOR-type CAM design, the CAM cell is usually XOR-type, and the pull-down transistors of the CAM cells are arranged in NOR type, i.e., in parallel. The match line is initially precharged to high. For a CAM word, if one or more cells are mismatched, the match line is discharged to 0. Only when all cells are matched, i.e., the search data are identical to the stored data, does the match line retain the logic high set in the precharge phase. Because the pull-down path is very short, the match line is discharged to 0 quickly in case of a mismatch. Thus, the NOR-type CAM provides the best search performance. Note that arranging the pull-down transistors in NOR type is beneficial for search performance, but they contribute a lot of drain capacitance to the match line.

#### B. NAND-Type CAM

In contrast to the NOR-type CAM, the NAND-type CAM aims to reduce the power dissipated in the search operation. Its CAM cell is implemented as XNOR-type instead of XOR-type, and the pull-down transistors of the CAM cells in the same word are arranged in NAND type, i.e., in series, as shown in Fig. 2(b). The match line is initially precharged to high, and discharged to 0 only when all CAM cells are matched. Because the load capacitance of the match line is small and only one match line is discharged to 0 during a search, the power consumption is minimal. However, the pull-down path is long, so the match line discharge is very slow in case of a match. Thus, the NAND-type CAM trades search performance for a large power saving.
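
The contrast between the two match line styles can be illustrated with a purely behavioral sketch. It models only logic levels (no timing or capacitance), and the function names and the four example words are illustrative assumptions rather than part of the original design.

```python
def nor_match_line(stored, search):
    """NOR-type ML: stays high (1) on a match, discharges to 0 on any mismatch."""
    ml = 1  # precharge phase: ML is high
    # Parallel pull-downs: a single mismatched cell is enough to discharge ML.
    if any(s != q for s, q in zip(stored, search)):
        ml = 0
    return ml


def nand_match_line(stored, search):
    """NAND-type ML: discharges to 0 only when every cell matches."""
    ml = 1  # precharge phase: ML is high
    # Series pull-downs: the chain conducts only if all cells match.
    if all(s == q for s, q in zip(stored, search)):
        ml = 0
    return ml


def discharged_lines(words, search, ml_fn):
    """Count how many match lines are discharged by one search."""
    return sum(1 for w in words if ml_fn(w, search) == 0)


# Hypothetical 4-word, 4-bit CAM content; only the first word matches.
words = [[1, 0, 1, 1], [0, 0, 1, 1], [1, 1, 1, 1], [1, 0, 1, 0]]
search = [1, 0, 1, 1]
nor_count = discharged_lines(words, search, nor_match_line)
nand_count = discharged_lines(words, search, nand_match_line)
print(nor_count, nand_count)
```

In this example the NOR-type array discharges every mismatched line while the NAND-type array discharges only the matched one, which is the switching-activity difference described above.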

## 3) HYBRID-TYPE CAM DESIGN

#### A. Overview

The key idea behind our design is to combine the performance advantage of the NOR-type CAM with the power efficiency of the NAND-type CAM. As shown in Fig. 3, a CAM word consists of two segments, i.e., SEG\_1 and SEG\_2, and the necessary control circuitry. In SEG\_1, the CAM cells are implemented as XNOR-type and their pull-down transistors are arranged in NAND type, denoted as the NAND-type block in Fig. 3. The NAND-type block is connected to the ground only when all the CAM cells of SEG\_1 are matched. In contrast, SEG\_2 is implemented with XOR-type CAM cells whose pull-down transistors are arranged in NOR type, denoted as the NOR-type block in Fig. 3.



Fig. 3. Word structure of the hybrid-type CAM design.

Table 1 Key Node Voltage (H/L) and Path Connection/Disconnection (O/X) for Each Case in the Hybrid-Type CAM Design

| Case   | SEG\_1   | SEG\_2   | Path T1 | Path T2 | Path T3 | Node M1 | Node M2 | Node ML | Result   |
|--------|----------|----------|---------|---------|---------|---------|---------|---------|----------|
| Case 1 | mismatch | mismatch | X       | X       | X       | H       | H       | H       | mismatch |
| Case 1 | mismatch | match    | X       | X       | X       | H       | H       | H       | mismatch |
| Case 2 | match    | mismatch | O       | O       | O       | L       | L       | H       | mismatch |
| Case 3 | match    | match    | O       | O       | X       | L       | H       | L       | match    |

#### B. Search Operation

Similar to the traditional CAM, a search in our design has two phases: the *precharge* phase and the *match evaluation* phase. In the precharge phase, all the match lines are first precharged to high, and then in the match evaluation phase only the matched words change the logic level of the match line from high to low.

International Journal of Emerging Technologies in Engineering Research (IJETER), Volume 4, Issue 4, April (2016), www.ijeter.everscience.org

1) Precharge Phase: In this phase, the control signal PRE is low. Thus, the match line (ML) is initially precharged to high. Because the pull-down paths T1, T2, and T3 are disconnected by the *N1*, *N2*, and *N3* transistors, respectively, both the M1 and M2 nodes are precharged to high via *P1* and *P2*.

2) Match Evaluation Phase: After the precharge phase, the control signal PRE is asserted high and the search data are loaded on the bit lines to start the matching process. This phase is called the *match evaluation phase*. Because a CAM word is divided into two segments, i.e., SEG\_1 and SEG\_2 as shown in Fig. 3, there are four possible combinations of segment match results in the match evaluation phase, which are grouped into the following three cases.

Case 1: SEG\_1 Is Mismatched and SEG\_2 Is Mismatched/Matched: Because SEG\_1 is a mismatch, at least one NMOS transistor in the NAND-type block is turned off, which disconnects the pull-down path **T1** from the ground. Therefore, node **M1** remains high, which turns off the tail transistors *N2* and *N3* and disconnects the pull-down paths **T2** and **T3**. This implies that no matter whether SEG\_2 is a match or a mismatch, node **M2** stays high and *N4* is turned on. Because the paths **T1** and **T2** are disconnected from the ground, the match line **ML** maintains logic high as in the precharge phase.

Case 2: SEG\_1 Is Matched and SEG\_2 Is Mismatched: Because SEG\_1 is a match, all NMOS transistors in the NAND-type block are turned on, which connects the path **T1** to the ground. Therefore, node **M1** is discharged to 0, which turns on the tail transistors *N2* and *N3*. As shown in the waveform of Fig. 5, **M1** is precharged to high during the precharge phase, and then discharged to 0 through the connected **T1** path during the evaluation phase.



Fig. 4. HSPICE waveform for Case 1, in which SEG\_1 is mismatched.



In this case, because SEG\_2 is a mismatch, at least one NMOS transistor in the NOR-type block is turned on, which connects the pull-down path T3 to the ground. Thus, the M2 node is discharged to 0, as illustrated in Fig. 5, in which Cell at logic high indicates that SEG\_2 is mismatched. Because M2 is low, the N4 transistor is turned off to prevent the match line from discharging to 0 through paths T1 and T2. Therefore, ML stays high to indicate that this word is mismatched. Note that there is a small pulse marked with the circle in Fig. 5. It is incurred by the race condition problem, which is likely to cause a false match and is discussed in Section 4-A.

Case 3: SEG\_1 Is Matched and SEG\_2 Is Matched: Similar to Case 2, the M1 node is discharged to 0 to turn on the tail transistors N2 and N3. Because SEG\_2 is a match in this case, all NMOS transistors in the NOR-type block are turned off, which disconnects the pull-down path T3 from the ground. Thus, the M2 node retains logic high as in the precharge phase, as shown in Fig. 5, in which Cell at logic low implies that SEG\_2 is matched. Consequently, P4 is turned off and N4 is turned on to discharge the match line to 0 through the pull-down paths T1 and T2. This indicates a real match.

Note that our design provides two pull-down paths, i.e., T1 and T2, to discharge the match line. Because the length of the T1 path depends on the SEG\_1 length, the discharge delay of T1 increases with the SEG\_1 length. In contrast, the length of the T2 path is fixed at one NMOS transistor. It is independent of the SEG\_1 length and is shorter than T1. Therefore, T2 is a fast discharge path that ensures our design has search performance comparable to the traditional NOR-type CAM.
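
The three cases can be summarized with a small steady-state model of one hybrid-type word. This is a sketch under simplifying assumptions: it ignores timing (and therefore the race condition and charge sharing), the function name `hybrid_word` is hypothetical, and the node and path names follow Table 1.

```python
def hybrid_word(seg1_match, seg2_match):
    """Steady-state path and node states after the match evaluation phase."""
    t1 = seg1_match                   # NAND-type block conducts only if all of SEG_1 matches
    m1 = "L" if t1 else "H"           # M1 is discharged through T1
    tails_on = m1 == "L"              # tail transistors N2 and N3 turn on once M1 is low
    t2 = tails_on                     # fast path T2 is gated by N2
    t3 = tails_on and not seg2_match  # NOR-type block conducts on any SEG_2 mismatch
    m2 = "L" if t3 else "H"           # M2 is discharged through T3
    n4_on = m2 == "H"                 # N4 links ML to T1/T2 only while M2 is high
    ml = "L" if n4_on and (t1 or t2) else "H"

    def mark(connected):
        return "O" if connected else "X"

    return {
        "T1": mark(t1), "T2": mark(t2), "T3": mark(t3),
        "M1": m1, "M2": m2, "ML": ml,
        "Result": "match" if ml == "L" else "mismatch",
    }


for seg1 in (False, True):
    for seg2 in (False, True):
        print(seg1, seg2, hybrid_word(seg1, seg2))
```

Under these assumptions the four printed rows correspond to the four rows of Table 1, with the match line going low only when both segments match.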

#### C. Related Work

There has been much previous work on CAM power reduction. Because our design divides a CAM word into two segments, we focus only on the work related to word segmentation techniques. In [5], Zukowski et al. introduced a selective precharge technique to reduce the match line power consumption by breaking a CAM word into two stages. A small subset of CAM cells is used to do a precalculation, and the result decides whether the match line needs to be precharged at all, i.e., conditional (or selective) precharge. A similar CAM word structure, called static divided word match line, was proposed in [6]. Besides segmenting the match line, that work uses static circuit design to improve reliability. In addition, a new CAM cell with a single bit line was introduced. The single bit line design requires only one heavily loaded bit line and prevents frequent switching. Therefore, the proposed static design can further reduce the CAM power dissipated in bit line switching activities.

Compared with the techniques in [5] and [6], the major differences of our design are summarized as follows.

- 1) Unlike [5], in which the ML is precharged conditionally, the match line is always precharged in our design, and then it is discharged conditionally.
- 2) Because we decouple all CAM cells from the match line, the match line of the hybrid-type CAM is lightweight. In addition, we provide a fast pull-down path to discharge the lightweight match line quickly. Therefore, the search performance of our design is better than both [5] and [6], in which the match line is still heavyweight and the selective precharge results in a modest delay penalty.
- 3) In the selective precharge technique, additional clock phases are critical to perform the correct search operation, which increases both the complexity and the power consumption of the clock. In contrast, our design needs no additional timing control signals.
- 4) In the static divided word match line, the single bit line design is indeed effective in reducing the bit line power consumption, but it results in the write problem [10], that is, it is considerably difficult to write the cell state from low to high in the single bit line configuration. The possible solutions are to provide a specialized write port or to modify the cell circuit. Both methods increase the transistor count, and thus the power consumption of the cell.
- 5) In addition, there is a short-circuit path in the static divided method if the first segment is matched and the second segment is mismatched. In contrast, our design is free from the short-circuit path in all possible cases.

An adaptive serial-parallel CAM, called SPCAM [7], is another low-power CAM structure. Besides dividing a CAM word into two segments, SPCAM can operate in either parallel or serial mode. In serial mode, the energy consumption is almost a quarter of that of the conventional parallel CAM, but the performance degradation is about 25%. In parallel mode, the energy consumption is still 33% better than that of the conventional parallel CAM without any performance penalty. By comparison, the search performance of our design is much better than that of SPCAM operating in serial mode, and the power reduction of our design is larger than that of SPCAM operating in parallel mode, where both segments are active.

## 4) IMPLEMENTATION ISSUES

Depending on the application, the user can adjust the length of SEG\_1. If the length of a CAM word is $n$ bits and the length of SEG\_1 is $x$ bits, then the length of SEG\_2 is $(n - x)$ bits. In SEG\_1, because all the pull-down transistors are arranged in series (i.e., the NAND-type block) and they are on the critical path to discharge the match line, the length of SEG\_1 is a powerful lever on the functionality, performance, and power efficiency of our design.

#### A. SEG\_1 Length Versus Race Condition

From Fig. 3, we note that the speed of the **M1** discharge depends on the length of SEG\_1. This implies a possible *race condition* problem in Case 2, i.e., SEG\_1 is matched and SEG\_2 is mismatched: a) if the **M1** discharge is fast enough, the tail transistor *N3* is turned on quickly to discharge **M2**, such that the *N4* transistor is turned off quickly to prevent the match line from discharging; the logic high level of the match line is therefore retained correctly; and b) if the **M1** discharge is so slow that the on time of the *N4* transistor is prolonged, the match line is discharged unexpectedly; if the voltage level of the match line drops too low, a false match results.

To prevent the incorrect match incurred by the race condition, we add a PMOS transistor, *P4*, to provide the level-restore capability. Once the **M2** node is discharged to 0, regardless of the discharge speed, the *P4* transistor is turned on to supply the lost charge. Consequently, our design is immune to the potential race condition problem. This effect can be observed in Fig. 5, in which there is a small pulse marked with the circle: the lost charge is restored quickly.

#### B. SEG\_1 Length Versus Charge Sharing

If SEG\_1 is too long, the *charge sharing* problem may occur when SEG\_1 is mismatched and SEG\_2 is matched. As shown in Fig. 6, the worst case is that all the pull-down transistors are turned on except the leftmost one. In this case, the charge of the **M1** node is shared among the intermediate nodes, $i_0 \sim i_5$, such that the voltage level of the **M1** node decreases. Because SEG\_2 is matched, *N4* is turned on to discharge the match line. If the voltage level of the match line becomes too low, a false match results.



Fig. 6. Example of the charge sharing problem incurred by large SEG\_1.
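
A first-order feel for the charge sharing effect can be obtained from charge conservation. The sketch below assumes that the intermediate nodes are fully discharged before the search, that every intermediate node has the same capacitance, and that no charge is lost; the supply voltage and capacitance values are hypothetical and are not taken from the paper.

```python
def shared_voltage(vdd, c_m1, c_int, k):
    """Voltage of M1 after its charge is shared with k intermediate nodes.

    Charge conservation: c_m1 * vdd = (c_m1 + k * c_int) * v_final,
    assuming the k intermediate nodes start at 0 V.
    """
    return vdd * c_m1 / (c_m1 + k * c_int)


# Hypothetical values: 1.8 V supply, 10 fF on M1, 1 fF per intermediate node.
VDD = 1.8
C_M1_FF = 10.0
C_INT_FF = 1.0

for k in range(0, 7):
    print(k, round(shared_voltage(VDD, C_M1_FF, C_INT_FF, k), 3))
```

Each additional intermediate node lowers the remaining M1 voltage, which is why a longer SEG\_1 makes the worst case more severe in this simplified picture.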

#### C. SEG\_1 Length Versus Power Saving

As described above, a short SEG\_1 can prevent the charge sharing problem, but it increases the probability of **M1** discharge. Suppose, for example, that the length of SEG\_1 is one bit. For a random pattern, the probability of **M1** discharge is 50% on average, i.e., the probability that the tail transistor N3 is turned on is also 50%. Because there are $n - 1$ pull-down transistors in the NOR-type block of an $n$-bit word in this example, the probability that the **T3** path is connected to the ground increases largely. This results in significant power dissipated in the discharge of the **M2** node with its large drain capacitance. Ideally, the probability of **M2** discharge is the product of the probability of the **T1** path conducting, i.e., $p(\mathbf{T1} \text{ conducting})$, and the probability of the **T3** path conducting, i.e., $p(\mathbf{T3} \text{ conducting})$, as shown in the following equation:

$$p(\mathbf{T1} \text{ conducting}) \times p(\mathbf{T3} \text{ conducting}) = \left(\frac{1}{2}\right)^x \times \left(1 - \left(\frac{1}{2}\right)^{n-x}\right) = \left(\frac{1}{2}\right)^x - \left(\frac{1}{2}\right)^n$$

where $n$ and $x$ are the lengths of the entire word and SEG\_1, respectively. In this equation, we assume that the match probability is 1/2 for each CAM cell. In SEG\_2, because all the pull-down transistors are arranged in NOR type, the **T3** path is disconnected only when they are all turned off. Thus, $p(\mathbf{T3} \text{ conducting})$ is equal to $1 - (1/2)^{n-x}$. Fig. 8 shows the probability of **M2** discharge for various SEG\_1 lengths at a fixed word length $n$. Clearly, the probability of **M2** discharge decreases sharply as the length of SEG\_1 is increased. This implies that the search operation consumes more power when the length of SEG\_1 is decreased.
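
The closed-form probability can be cross-checked by enumerating every match/mismatch pattern of a short word. The sketch below uses an 8-bit word purely to keep the enumeration small (an assumption for illustration), with each cell matching with probability 1/2 as in the text; the function names are hypothetical.

```python
from itertools import product


def p_m2_discharge(n, x):
    """Closed form: (1/2)^x - (1/2)^n for word length n and SEG_1 length x."""
    return 0.5 ** x - 0.5 ** n


def p_m2_discharge_enumerated(n, x):
    """Brute force over all 2^n equally likely per-cell match patterns."""
    hits = 0
    for pattern in product([True, False], repeat=n):
        seg1, seg2 = pattern[:x], pattern[x:]
        # M2 discharges when T1 conducts (all of SEG_1 matches)
        # and T3 conducts (at least one SEG_2 cell mismatches).
        if all(seg1) and not all(seg2):
            hits += 1
    return hits / 2 ** n


N_BITS = 8  # illustrative word length, chosen small for enumeration
for x in range(1, N_BITS):
    print(x, p_m2_discharge(N_BITS, x), p_m2_discharge_enumerated(N_BITS, x))
```

Both columns agree for every SEG\_1 length, and the probability roughly halves with each additional SEG\_1 bit.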

## 5) EXPERIMENTAL RESULTS

In this paper, we use TSMC 0.18 μm 1P6M technology to implement the proposed design. Fig. 9 shows the layout block diagram and the microphotograph of the fabricated hybrid-type CAM chip, where the shift registers are used for function verification. Note that the core is broken into four blocks for both performance and power efficiency. For a substantial comparison, besides the conventional NOR-type and NAND-type CAMs, we also implement the related designs, including the selective precharge scheme, the static divided word structure, and the SPCAM operating in serial and parallel modes. They are denoted as SP, SDW, SPCAM\_S, and SPCAM\_P, respectively. All CAM designs have a size of 128 × 32, i.e., 128 words by 32 bits, and the data presented in the following discussion are obtained from HSPICE post-layout simulation.



Fig. 7. Simulated output for match

Fig. 8. Simulated output for mismatch

The TSPICE simulation results show that the proposed MSML design is suitable for the simulated case with a 4-bit word size. S1, S2, S3, and S4 are the 4-bit search line input data, and ML1, ML2, ML3, and ML4 are the match line outputs. By minimizing the MML charge loss, the MSML design can largely reduce the ML energy consumption. Unlike most related work, where the power saving depends on the occurrence of the best case, the MSML design theoretically guarantees at least 50% ML power saving. The observation is that, in a CAM, all addresses are accessed for comparison when a search is done, which is completely different from a standard memory, in which only the selected address is accessed. The contents of the CAM are commonly referred to as keys, and the input is compared against the stored keys to find a match. The input goes into the CAM through the search lines (SLs) and is compared in parallel with all the keys. An encoder returns the address of the key that matches the input. That address is then typically used to access another memory where a value associated with that key is stored.
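
The lookup flow described above (parallel compare, encoder, then a second memory access) can be sketched functionally as follows. The 4-bit keys, the associated values, and the choice of a lowest-address priority encoder are illustrative assumptions, not details given in the paper.

```python
def cam_search(keys, search_word):
    """Return the address of the first stored key equal to search_word, or None."""
    # In hardware every word is compared at once; one boolean models each match line.
    match_lines = [key == search_word for key in keys]
    # Encoder stage (assumed here to prioritize the lowest matching address).
    for address, hit in enumerate(match_lines):
        if hit:
            return address
    return None


# Hypothetical 4-word CAM with 4-bit keys and an associated data memory.
keys = ["1010", "0111", "1100", "0001"]
values = ["entry A", "entry B", "entry C", "entry D"]

address = cam_search(keys, "1100")
if address is not None:
    print(address, values[address])
```

A miss returns `None`, which stands in for the case where no match line reports a match.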

#### A. Performance

In this paper, the metric used to evaluate CAM performance is the *match delay*, defined as the elapsed time from the moment the signal PRE is asserted high to the moment the match line is discharged to 0 in case of a match. Table II lists the match delay of all CAM designs where the SEG\_1 length is varied from 1 to 6 bits. Because they have no segmentation, the match delays of the NOR-type and NAND-type CAMs are fixed at 0.641 ns and 2.774 ns, respectively. In other words, the match delay of the NAND-type CAM is roughly 4.3 times that of the NOR-type CAM. As indicated in the background and [5], the NAND-type CAM is not a feasible solution because of its long match delay. Fig. 10 shows the normalized match delay, where the match delays of all CAM designs are normalized to that of the NOR-type CAM. From this figure, we summarize the most important aspects as follows.

- (1) It is clear that the search performance of our design is better than that of the other word segmentation techniques. In particular, only the hybrid-type CAM (with SEG\_1 length ≤ 4 bits) has better search performance than the conventional NOR-type CAM. Because the match line is precharged conditionally, the search performance of SP, SDW, and SPCAM\_S is worse than that of the NOR-type CAM. Note that the match delay of SPCAM\_P is almost the same as that of the NOR-type CAM. This is because in SPCAM\_P both SEG\_1 and SEG\_2 are always active concurrently for high search performance.
- (2) The SEG\_1 length has a significant impact on the search performance of all word segmentation techniques except for SPCAM\_S and SPCAM\_P, in which SEG\_1 is broken into several sets of two bits to limit the number of transistors in series to three. In our design, the match delay increases with the length of SEG\_1. As shown in Fig. 3, the match line discharge relies on the **M1** discharge that connects the **T1** and **T2** paths to the ground; further, the **M1** discharge delay increases with the number of transistors in the NAND-type block.
- (3) One interesting observation from this result is that when the SEG\_1 length is less than or equal to four bits, the match delay of our design is even shorter than that of the conventional NOR-type CAM. This is because our design decouples all CAM cells from the match line, such that the match line is lightweight. Once the fast path T2 is connected, it can discharge the lightweight match line quickly. Although the NAND-type block degrades the match performance slightly, the fast path T2 can compensate for the performance loss. Due to the charge sharing problem, the length of SEG\_1 is constrained to four bits or fewer. From the detailed data shown in Table II, if the length of SEG\_1 is 4, the match delay is 0.609 ns. Compared to the 0.641 ns of the conventional NOR-type CAM, our design improves the search performance by roughly 5%.

#### B. Power and Energy

Table III shows the power consumption during a search for all CAM designs where the SEG\_1 length is varied from 1 to 6 bits. Clearly, the search power consumption is reduced sharply when the SEG\_1 length is increased, except for SPCAM\_P, in which SEG\_1 and SEG\_2 are checked concurrently for high search performance. No matter whether SEG\_1 is a match or not, SEG\_2 is always active on every search, such that the search power of SPCAM\_P is only slightly less than that of the NOR-type CAM. In the proposed hybrid-type CAM, the search power consumption is roughly 0.29 mW when the SEG\_1 length is 6 bits. Compared to the NOR-type CAM, whose search power consumption is fixed at 3.04 mW, our design reduces the search power consumption by roughly 90%.

Besides reducing the search power consumption, increasing the SEG\_1 length has a large impact on the search performance, as revealed in Section 5-A. Consequently, increasing the SEG\_1 length is a tradeoff between power and performance. For a fair comparison, energy is a suitable metric, which is the product of the match delay (performance) and the search power (power). Combining Tables II and III, the detailed energy results for all CAM designs are summarized in Table IV, and Fig. 11 shows the normalized search energy, in which all energy results are normalized to that of the NOR-type CAM design. From Fig. 11, it is clear that the word segmentation techniques are indeed effective in reducing the energy consumption of CAM, except for SPCAM\_P [7], where both SEG\_1 and SEG\_2 are always active for high search performance. In fact, the search energy of SPCAM\_P [7] and that of the NOR-type CAM are almost the same. The major advantage of our design is that it not only largely reduces the search power, but also improves the match delay. In contrast, SP [5], SDW [6], and SPCAM\_S [7] incur a delay penalty while reducing the search power. Consequently, our design achieves the largest energy improvement among the related techniques. In addition, the energy efficiency of our design is even better than that of the NAND-type CAM, which has the lowest search power. From Table IV, our design reduces the energy consumption of the NOR-type CAM by 90% when the SEG\_1 length is 6 bits. The improvement difference between a 4-bit and a 6-bit SEG\_1 is marginal. However, if the SEG\_1 length is larger than 4 bits, a false match due to charge sharing is possible in our design. For a reliable system, when the SEG\_1 length is 4 bits, our design reduces the energy consumption of the NOR-type and NAND-type CAMs by roughly 88% and 40%, respectively.

TABLE II MATCH DELAY (ns) OF ALL CAM DESIGNS

| SEG\_1 length (bits) | Hybrid | SP [5] | SDW [6] | SPCAM_S [7] | SPCAM_P [7] | NOR   | NAND  |
|----------------------|--------|--------|---------|-------------|-------------|-------|-------|
| 1          | 0.347  | 0.783  | 0.720   | 0.731       | 0.652       | 0.641 | 2.774 |
| 2          | 0.483  | 0.763  | 0.730   | 0.738       | 0.655       | 0.641 | 2.774 |
| 3          | 0.562  | 0.740  | 0.750   | 0.740       | 0.652       | 0.641 | 2.774 |
| 4          | 0.609  | 0.722  | 0.768   | 0.738       | 0.651       | 0.641 | 2.774 |
| 5          | 0.663  | 0.720  | 0.775   | 0.736       | 0.649       | 0.641 | 2.774 |
| 6          | 0.685  | 0.740  | 0.792   | 0.732       | 0.646       | 0.641 | 2.774 |

TABLE III SEARCH POWER CONSUMPTION (mW) OF ALL CAM DESIGNS

| SEG\_1 length (bits) | Hybrid | SP [5] | SDW [6] | SPCAM_S [7] | SPCAM_P [7] | NOR    | NAND   |
|----------------------|--------|--------|---------|-------------|-------------|--------|--------|
| 1          | 1.3318 | 2.3823 | 2.0788  | 1.8788      | 3.0541      | 3.0400 | 0.1315 |
| 2          | 0.6617 | 1.2988 | 1.1313  | 1.0313      | 2.9842      | 3.0400 | 0.1315 |
| 3          | 0.4333 | 0.7638 | 0.6529  | 0.5529      | 2.9562      | 3.0400 | 0.1315 |
| 4          | 0.3554 | 0.5219 | 0.4259  | 0.4059      | 2.9431      | 3.0400 | 0.1315 |
| 5          | 0.3083 | 0.3761 | 0.3213  | 0.3013      | 2.9182      | 3.0400 | 0.1315 |
| 6          | 0.2889 | 0.3154 | 0.3031  | 0.3123      | 2.9029      | 3.0400 | 0.1315 |

TABLE IV SEARCH ENERGY (pJ) OF ALL CAM DESIGNS

| SEG\_1 length (bits) | Hybrid | SP [5] | SDW [6] | SPCAM_S [7] | SPCAM_P [7] | NOR    | NAND   |
|----------------------|--------|--------|---------|-------------|-------------|--------|--------|
| 1           | 0.4621 | 1.8653 | 1.4968  | 1.3734      | 1.9913      | 1.9486 | 0.3647 |
| 2           | 0.3196 | 0.9905 | 0.8259  | 0.7612      | 1.9547      | 1.9486 | 0.3647 |
| 3           | 0.2435 | 0.5652 | 0.4897  | 0.4091      | 1.9274      | 1.9486 | 0.3647 |
| 4           | 0.2164 | 0.3766 | 0.3271  | 0.2996      | 1.9160      | 1.9486 | 0.3647 |
| 5           | 0.2044 | 0.2708 | 0.2490  | 0.2217      | 1.8939      | 1.9486 | 0.3647 |
| 6           | 0.1979 | 0.2334 | 0.2401  | 0.2286      | 1.8753      | 1.9486 | 0.3647 |
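
Since the search energy is defined as the product of match delay and search power, the hybrid column of Table IV can be recomputed from Tables II and III. The sketch below does this for the hybrid, NOR-type, and NAND-type designs only; the helper names are hypothetical, and the numbers are copied from the tables.

```python
SEG1_LENGTHS = [1, 2, 3, 4, 5, 6]
HYBRID_DELAY_NS = [0.347, 0.483, 0.562, 0.609, 0.663, 0.685]         # Table II
HYBRID_POWER_MW = [1.3318, 0.6617, 0.4333, 0.3554, 0.3083, 0.2889]   # Table III
HYBRID_ENERGY_PJ = [0.4621, 0.3196, 0.2435, 0.2164, 0.2044, 0.1979]  # Table IV
NOR = (0.641, 3.0400)    # (delay in ns, power in mW)
NAND = (2.774, 0.1315)


def energy_pj(delay_ns, power_mw):
    """Energy in pJ: 1 ns * 1 mW = 1 pJ."""
    return delay_ns * power_mw


def reduction(new, baseline):
    """Fractional saving of `new` relative to `baseline`."""
    return 1.0 - new / baseline


hybrid_energy = [energy_pj(d, p) for d, p in zip(HYBRID_DELAY_NS, HYBRID_POWER_MW)]
nor_energy = energy_pj(*NOR)
nand_energy = energy_pj(*NAND)

for x, e in zip(SEG1_LENGTHS, hybrid_energy):
    print(f"SEG_1 = {x}: {e:.4f} pJ, "
          f"{reduction(e, nor_energy):.1%} below NOR, "
          f"{reduction(e, nand_energy):.1%} below NAND")
```

The recomputed values agree with Table IV to the printed precision, and the 4-bit row gives the roughly 88% and 40% savings quoted in the text.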

TABLE V NORMAL AND PEAK POWER CONSUMPTION (mW) FOR ALL CAM DESIGNS WHERE SEG\_1 LENGTH IS 4

| Power (mW)   | Hybrid | SP [5] | SDW [6] | SPCAM_S [7] | SPCAM_P [7] | NOR    | NAND   |
|--------------|--------|--------|---------|-------------|-------------|--------|--------|
| Normal Power | 0.3554 | 0.5219 | 0.4259  | 0.4059      | 2.9431      | 3.0400 | 0.1315 |
| Peak Power   | 2.8740 | 5.0711 | 4.4311  | 3.7954      | 2.9847      | 3.0580 | 0.2050 |

- 1) In the conventional NOR-type CAM, the worst-case data pattern is that all the CAM words are mismatched. The difference between normal power consumption and peak power consumption is only the power dissipated in the switching of one match line. As shown in Table V, the difference is negligible.
- 2) The worst case of the NAND-type CAM is that one CAM word is matched and the others are mismatched in the MSB CAM cell. Because the mismatched words incur the worst case of charge sharing, the peak power consumption is roughly 1.5 times as large as the normal power consumption in the NAND-type CAM.
- 3) For all word segmentation techniques, including our design, the worst case is that all the CAM words are matched in SEG\_1 and mismatched in SEG\_2. Unlike the NOR-type CAM, all word segmentation techniques enlarge the difference between the normal and peak power consumption.

As shown in Table V, the peak power consumption of the hybrid-type CAM is 2.874 mW, which is roughly eight times as large as the normal power consumption. This is because all the **M2** nodes with large drain capacitances are discharged to 0 and then precharged to $V_{DD}$ in the worst case. Compared to the NOR-type CAM, although the hybrid-type CAM largely amplifies the difference between peak power and normal power, it still achieves a 6% reduction in peak power consumption.
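
The ratios quoted for Table V follow directly from its entries. The short sketch below recomputes the peak-to-normal ratio of each design and the peak power saving relative to the NOR-type CAM; the dictionary layout and function names are illustrative choices.

```python
# (normal power in mW, peak power in mW), copied from Table V (SEG_1 length = 4)
TABLE_V_MW = {
    "Hybrid": (0.3554, 2.8740),
    "SP": (0.5219, 5.0711),
    "SDW": (0.4259, 4.4311),
    "SPCAM_S": (0.4059, 3.7954),
    "SPCAM_P": (2.9431, 2.9847),
    "NOR": (3.0400, 3.0580),
    "NAND": (0.1315, 0.2050),
}


def peak_to_normal(design):
    """How many times larger the peak power is than the normal power."""
    normal, peak = TABLE_V_MW[design]
    return peak / normal


def peak_reduction_vs_nor(design):
    """Fractional peak power saving relative to the NOR-type CAM."""
    return 1.0 - TABLE_V_MW[design][1] / TABLE_V_MW["NOR"][1]


for name in TABLE_V_MW:
    print(f"{name}: peak/normal = {peak_to_normal(name):.2f}, "
          f"peak saving vs NOR = {peak_reduction_vs_nor(name):.1%}")
```

A negative saving for a design means that its peak power exceeds that of the NOR-type CAM.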

## 6) CONCLUSIONS

In this paper, we have developed a hybrid-type CAM design in which we decouple all the CAM cells from the match line and provide a fast path to accelerate the search operation. With a marginal area overhead, our design not only largely reduces the search power consumption but also improves the search performance.

#### REFERENCES

- [1] H. Noda *et al.*, "A cost-efficient high-performance dynamic TCAM with pipelined hierarchical searching and shift redundancy architecture," *IEEE J. Solid-State Circuits*, vol. 40, no. 1, pp. 245–253, Jan. 2005.
- [2] B. Agrawal and T. Sherwood, "Ternary CAM power and delay model: Extensions and uses," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 16, no. 5, pp. 554–564, May 2008.
- [3] Y.-J. Chang, "Two-layer hierarchical matching method for energy-efficient CAM design," *Electron. Lett.*, vol. 43, no. 2, pp. 80–82, Jan. 2007.
- [4] Y.-J. Chang and Y.-H. Liao, "Hybrid-type CAM design for both power and performance efficiency," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 16, no. 8, pp. 965–974, Aug. 2008.
- [5] C. A. Zukowski and S.-Y. Wang, "Use of selective precharge for low-power content-addressable memories," in *Proc. IEEE Int. Symp. Circuits Syst.*, Jun. 1997, pp. 1788–1791.
- [6] K. H. Cheng, C. H. Wei, and S. Y. Jiang, "Static divided word matching line for low-power content addressable memory design," in *Proc. Int. Symp. Circuits Syst.*, May 2004, pp. 629–632.
- [7] A. Efthymiou and J. D. Garside, "An adaptive serial-parallel CAM architecture for low-power cache blocks," in *Proc. Int. Symp. Low Power Electron. Design*, 2002, pp. 136–141.
- [8] B. D. Yang and L. S. Kim, "A low-power CAM using pulsed NAND-NOR match-line and charge-recycling search-line driver," *IEEE J. Solid-State Circuits*, vol. 40, no. 8, pp. 1736–1744, Aug. 2005.
- [9] D. S. Vijayasarathi, M. Nourani, M. J. Akhbarizadeh, and P. T. Balsara, "Ripple-precharge TCAM: A low-power solution for network search engines," in *Proc. Int. Conf. Comput. Design*, Oct. 2005, pp. 243–248.
- [10] S.-H. Yang, Y.-J. Hung, and J.-F. Li, "A low-power ternary content addressable memory with Pai-Sigma matchlines," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 20, no. 10, pp. 1909–1913, Oct. 2012.

- [11] G. Kasai, Y. Takarabe, K. Furumi, and M. Yoneda, "200 MHz/200 MSPS 3.2 W at 1.5 V Vdd, 9.4 Mbits ternary CAM with new charge injection match detect circuits and bank selection scheme," in *Proc. IEEE Custom Integr. Circuits Conf.*, Sep. 2003, pp. 387–390.
- [12] I. Arsovski, T. Chandler, and A. Sheikholeslami, "A ternary content-addressable memory (TCAM) based on 4T static storage and including a current-race sensing scheme," *IEEE J. Solid-State Circuits*, vol. 38, no. 1, pp. 155–158, Jan. 2003.
- [13] S. Baeg, "Low-power ternary content-addressable memory design using a segmented match line," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 55, no. 6, pp. 1485–1494, Jul. 2008.
- [14] K. Pagiamtzis and A. Sheikholeslami, "A low-power content-addressable memory (CAM) using pipelined hierarchical search scheme," *IEEE J. Solid-State Circuits*, vol. 39, no. 9, pp. 1512–1519, Sep. 2004.
- [15] N. Mohan and M. Sachdev, "Low-capacitance and charge-shared match lines for low-energy high-performance TCAMs," *IEEE J. Solid-State Circuits*, vol. 42, no. 9, pp. 2054–2060, Sep. 2007.